5. Running a constant-pH MD simulation

The protons package extends OpenMM with the capability to alter the configuration of protons in simulation. This allows for sampling over multiple protonation states and prototropic tautomers of amino acid residues and small molcules. These effects have been shown as important in multiple biological systems ([Czodrowski2007a], [Czodrowski2007b], [Steuber2007], [Neeb2014]), and now you can include them in OpenMM simulations.

5.1. Setting up the AmberProtonDrive class

In order to run a simulation containing multiple protonation states and tautomers, you can use AmberProtonDrive. This object is responsible for keeping track of the current system state, and updates the OpenMM context with the correct parameters. It uses an instantaneous Monte Carlo sampling method [Mongan2004] to update implicit solvent systems. Explicit solvent systems can be updated using NCMC [Stern2007], [Nilmeier2011], [Chen2015]. The driver maintains a dictionary of all possible protonation states and tautomers of each residue in the simulation system, and their parameters in the AMBER99 constant-pH force field [Mongan2004].

To instantiate the AmberProtonDrive, you need a prmtop file (loaded using simtk.openmm.app.AmberPrmtopFile), and an OpenMM system (simtk.openmm.openmm.System).

Additionally, a .cpin file (generated by cpinutil.py, which is part of Ambertools) is needed to provide information on amino acid parameters. Please see the Preparing systems for constant-pH simulation section for instructions how to generate input files.

In order to perform NCMC, you need to provide the integrator that you will use for the regular MD part as well. The implementation of NCMC uses a Velocity Verlet integrator, so make sure that your integrator is not timewise incompatible like for instance Leapfrog integrators.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
from protons import AmberProtonDrive
from simtk import unit, openmm
from simtk.openmm import app
from openmmtools.integrators import VelocityVerletIntegrator

# Load a protein system
prmtop = app.AmberPrmtopFile('protein.prmtop')
cpin_filename = 'protein.cpin'

# System settings
pH = 7.4
temperature = 300.0 * unit.kelvin

# Define an integrator for the system
integrator = VelocityVerletIntegrator(1.0 * unit.femtoseconds)

# Create an implicit solvent system from the AMBER prmtop file
system = prmtop.createSystem(implicitSolvent=app.OBC2, nonbondedMethod=app.NoCutoff, constraints=app.HBonds)
# Create the driver that will update the protons on the system and track its state
driver = AmberProtonDrive(system, temperature, pH, prmtop, cpin_filename, integrator, pressure=None, ncmc_steps_per_trial=0, implicit=True)

# Create a simulation object using the correct integrator
simulation = app.Simulation(prmtop.topology, system, driver.compound_integrator, platform)

Next, you will need to provide reference free energies for each residue in solvent.

5.2. Solvent reference state calibrations

In order to simulate a protein or small molecule with multiple protonation states and/or tautomers, it is necessary to calculate the free energy difference between a reference state. This free energy can depend on the temperature, solvent model, pressure, pH and other factors. It is therefore imperative that this is run whenever you’ve changed simulation settings.

Individual states are denoted with a subscript \(k\). We use self-adjusted mixture sampling (SAMS) to calculate \(g_k\), a reference free energy. The \(g_k\) s correct for (electrostatic) force field contribution to the free energy difference between the reference states, so that the populations produced in simulation match what is expected from the pH dependence, or tautomeric populations.

5.2.1. Residues

The package supports automatic \(g_k\) calculations the following residues by default, denoted by the residue name with the max number of protons added. The reference state is taken to be the state of a single capped amino acids in water.

  • Glutamic acid, GL4 (pKa=4.4)
  • Aspartic acid, AS4 (pKa=4.0)
  • Histidine, HIP (pKa delta=6.5, pKa epsilon = 7.1)
  • Tyrosine, TYR (pKa=9.6)
  • Cysteine, CYS (pKa=8.5)
  • Lysine, LYS (pKa=10.4)

To automatically calibrate all amino acids available in a system, one can use the AmberProtonDrive.calibrate() method.

5.2.2. The AmberProtonDrive.calibrate() method

The AmberProtonDrive.calibrate() method will set this up automatically for the settings you have provided.

1
 calibration_results = driver.calibrate()

It will automatically perform a free energy calculation using self-adjusted mixture sampling (SAMS) to calculate the reference free energy for each state \(g_k\). While this is conveniently carried out automatically, this may take quite some time (minutes to 2-hours on a GTX-Titan per unique residue type). We are experimenting with a setup that can perform calibration in parallel so that you can run calibration more efficiently. If you store these results, you can reload them in a subsequent run.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
# Pre-calculated values
# temperature = 300.0 * unit.kelvin
# pressure = None
# timestep = 1.0 * unit.femtoseconds
# pH = 7.4
# Amber 99 constant ph residues

calibration_results = {'as4': np.array([3.98027947e-04,  -3.61785292e+01,  -3.98046143e+01,
                                        -3.61467735e+01,  -3.97845096e+01]),
                       'cys': np.array([7.64357397e-02,   1.30386793e+02]),
                       'gl4': np.array([9.99500333e-04,  -5.88268681e+00,  -8.98650420e+00,
                                        -5.87149375e+00,  -8.94086390e+00]),
                       'hip': np.array([2.39229276,   5.38886021,  13.12895206]),
                       'lys': np.array([9.99500333e-04,  -1.70930870e+01]),
                       'tyr': np.array([6.28975142e-03,   1.12467299e+02])}

driver.import_gk_values(calibration_results)

Warning

When reusing calibrated values, you must make sure that you are using the exact same force field, pH and other properties of the system. If you are not sure, we recommend that you rerun the calibration.

For more in depth explanation of the calibration procedure, please see Advanced calibration options.

Now that \(g_k\) values have been calibrated, you are ready to run a simulation.

5.3. Running the simulation

After calibration, you can start running a simulation. Decide on the number of timesteps, and the frequency of updating the residue states. To propagate in regular dynamics, just use simulation.step. The residue states are updated using the AmberProtonDrive.update() method. This method selects new states using a Monte Carlo procedure, and modifies the parameters in your simulation context to reflect the selected states.

1
2
3
4
5
nupdates, mc_frequency = 10000, 6000

for iteration in range(1, nupdates):
    simulation.step(mc_frequency) # MD
    driver.update(simulation.context)  # protonation

In this example, every 6000 steps of molecular dynamics, the residue states are driven once. This gets repeated for a total of 10000 iteration.

5.4. Tracking the simulation

This section and the API still need to be written.

5.5. Basic example

Below is a basic example of how to run a simulation using the AmberProtonDrive without using the calibration API.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
  from simtk import unit, openmm
  from simtk.openmm import app
  from protons import AmberProtonDrive
  import numpy as np
  from openmmtools.integrators import VelocityVerletIntegrator
  from sys import stdout


  # Import one of the standard systems.
  temperature = 300.0 * unit.kelvin
  timestep = 1.0 * unit.femtoseconds
  pH = 7.4

  platform = openmm.Platform.getPlatformByName('CUDA')

  prmtop = app.AmberPrmtopFile('complex.prmtop')
  inpcrd = app.AmberInpcrdFile('complex.inpcrd')
  positions = inpcrd.getPositions()
  topology = prmtop.topology
  cpin_filename = 'complex.cpin'
  integrator = VelocityVerletIntegrator(timestep)

  # Create a system from the AMBER prmtop file
  system = prmtop.createSystem(implicitSolvent=app.OBC2, nonbondedMethod=app.NoCutoff, constraints=app.HBonds)
  # Create the driver that will track the state of the simulation and provides the updating API
  driver = AmberProtonDrive(system, temperature, pH, prmtop, cpin_filename, integrator, pressure=None, ncmc_steps_per_trial=0, implicit=True)

  # Create an OpenMM simulation object as one normally would.
  simulation = app.Simulation(topology, system, driver.compound_integrator, platform)
  simulation.context.setPositions(positions)
  simulation.context.setVelocitiesToTemperature(temperature)

  # pre-equilibrated values.
  # temperature = 300.0 * unit.kelvin
  # pressure = None
  # timestep = 1.0 * unit.femtoseconds
  # pH = 7.4
  # Amber 99 constant ph residues, converged to threshold of 1.e-7

  calibration_results = {'as4': np.array([3.98027947e-04,  -3.61785292e+01,  -3.98046143e+01,
                                          -3.61467735e+01,  -3.97845096e+01]),
                         'cys': np.array([7.64357397e-02,   1.30386793e+02]),
                         'gl4': np.array([9.99500333e-04,  -5.88268681e+00,  -8.98650420e+00,
                                          -5.87149375e+00,  -8.94086390e+00]),
                         'hip': np.array([2.39229276,   5.38886021,  13.12895206]),
                         'lys': np.array([9.99500333e-04,  -1.70930870e+01]),
                         'tyr': np.array([6.28975142e-03,   1.12467299e+02])}

  driver.import_gk_values(calibration_results)

  # 60 ns, 10000 state updates
  niter, mc_frequency = 10000, 6000
  simulation.reporters.append(app.DCDReporter('trajectory.dcd', mc_frequency))

  for iteration in range(1, niter):
      simulation.step(mc_frequency) # MD
      driver.update(simulation.context)  # protonation